1. Introduction


Computing in the digital era has been shaped by a strong demand to protect, store and access data. The use of computer systems has grown rapidly over the years and, as a result, has given rise to Big Data analysis. Data is growing faster than ever before, and by the year 2020 about 1.7 megabytes of new information will be created every second for every human being on the planet.[1] The number of social media users worldwide has also risen steadily since 2010: in 2018 alone there were an estimated 2.67 billion social media users around the globe, up from 2.34 billion in 2016.[2]


As a result, the research question, “To what extent would depth-first search or breadth-first search be suitable for search in graph data structures used by social networks, taking time and memory as determining factors in Java?”, seeks to investigate which of the two algorithms is better suited to searching the data structures implemented by social networking sites such as Facebook, Twitter and Google Plus, as more users join these platforms every day. This research is important because users of such platforms perform searches daily to connect with more people. Thus, the speed of the search matters to the user, and the manageability of the search algorithms that make these features available matters to the provider.


Better data management and faster access to data have been in demand in recent years. In a bid to manage efficiently every type of data being produced (ranging from numerical values to objects), computer scientists developed data structures such as arrays, lists, linked lists, queues, stacks, trees and graphs. They also devised sequential, step-by-step procedures (algorithms) through which data stored in such structures can be processed, managed and turned into information.


[1] Marr, Bernard. “Big Data: 20 Mind-Boggling Facts Everyone Must Read.” Forbes, Forbes Magazine, 19 Nov. 2015, www.forbes.com/sites/bernardmarr/2015/09/30/big-data-20-mind-boggling-facts-everyone-must-read/#62e17e3c17b1.

[2] “Number of Social Media Users Worldwide 2010-2021.” Statista, www.statista.com/statistics/278414/number-of-worldwide-social-network-users/.


1.1 Big Data with Reference to Social Networks 

The term “Big Data” describes a dataset whose size exceeds what typical database software can capture, store, manage and analyse. Big Data falls into three broad categories: structured, semi-structured and unstructured data. Structured data is data that is contained in a field and saved in a record or file; a common example is data stored in a relational database or a log file. Conversely, unstructured data is not organised in any particular way. An important example is content on a social media site such as Facebook. This content includes images, videos, text and advertisements, to name a few, and is usually too difficult to organise for storage in a database or log file.[3] Semi-structured data lies between the two: it is not held in fixed tables but carries tags or keys that describe its elements. The data used by a social network architecture can fall into any of these three categories, depending on the module using it and the purpose the organization has for it.


1.2 Why is data from Social networks classified as Big data? 


There are several common features of Big Data, known as the “V” features and coined by different researchers: Volume, Variety, Velocity and Veracity.


Volume: The amount of data generated by social networks grows every day. Examples include new friendships between users and new groups created by users on Facebook (known as clusters in the Facebook graph), new followers and retweets on Twitter, and new profile accounts created by people joining these networks. Such processes generate a large gamut of data about the people who use these social networks.


[3] Beal, Vangie. “Structured Data.” What Is Structured Data? Webopedia Definition, www.webopedia.com/TERM/S/structured_data.html.


Variety: In a bid to provide the best form of socialization and entertainment, social networks have integrated a variety of data forms, generated both by their users and by the organization itself, for their users to enjoy. These types of data include images, audio, video, diagrams and others. They are derived from different sources such as GPS signals, sensors, ad hoc networks, other social networks and many more that capture data and information updates.


Velocity: Social networks generate data continuously, but only useful data is needed for processing to yield effective information. “Velocity” refers to the increasing speed at which this data is created, and the increasing speed at which it can be processed, stored and analysed by relational databases. Processing data in real time is an area of particular interest, because it allows companies to do things like display personalised ads based on the web pages you visit and on your recent search, viewing and purchase history. On Facebook, for example, the videos shown on your timeline are based on your recent account activity.


Veracity: Although there is widespread agreement about the potential value of Big Data, data is virtually worthless if it is not accurate. This is particularly true in programs that involve automated decision-making, or that feed the data into an unsupervised machine learning algorithm. The results of such programs are only as good as the data they work with. Data from social networks tends to meet this requirement because most of the data users provide is authentic, as these users want to reap the full benefits of registering on these networks.[4]


[4] McNulty, Eileen, et al. “Understanding Big Data: The Seven V's.” Dataconomy, 8 May 2017, dataconomy.com/2014/05/seven-vs-big-data/.


2. Data structures 


A data structure is a way of collecting and organising data so that operations can be performed on it effectively; it is a storage place for data. The concept is about arranging data elements in terms of some relationship, for better organization and storage. Data structures are programmed to store ordered or unordered data so that various operations can be performed on it easily in order to turn the data into information. A data structure is normally designed and implemented so that it reduces complexity and increases efficiency.


Anything that can store data can be called a data structure. Hence Integer, Float, Boolean, Char and so on are all data structures; they are known as primitive data structures. There are also more complex data structures, called abstract data structures, which are used to store large and connected data. Examples are:


• Linked list
• Tree
• Graph
• Stack, queue, etc.


All these data structures allow us to perform different operations on data. A data structure is normally selected according to the type of operation required and the kind of data being generated.


2.1 Graph data structures
Graphs are one of the most interesting data structures in computer science. Graphs and trees are somewhat similar in structure, and both are considered abstract data structures. A tree is a restricted form of graph. However, there are two important differences between trees and graphs:


1. Unlike in trees, a node in a graph can have many parents.


2. The links between the nodes may have values or weights.


The following example shows a very simple graph that can be abstractly represented on the computer:


In the above graph, A, B, C, D, E and F are called nodes, and the connecting lines between them are called edges. Edges can be directed, which is shown by arrows; they can also be weighted, in which case numbers are assigned to them. Hence, a graph can be directed or undirected, and weighted or unweighted.[5]
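As an illustration of the terms above, the following is a minimal sketch of how a directed or undirected, weighted graph could be held in memory, assuming an adjacency-list representation. The class WeightedGraph and its methods are hypothetical names chosen for this example; they are not part of the program used in this essay.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: a graph stored as an adjacency list.
class WeightedGraph {
    // One outgoing edge: the node it points to and its weight.
    static final class Edge {
        final String target;
        final int weight;

        Edge(String target, int weight) {
            this.target = target;
            this.weight = weight;
        }
    }

    private final Map<String, List<Edge>> adjacency = new HashMap<String, List<Edge>>();
    private final boolean directed;

    WeightedGraph(boolean directed) {
        this.directed = directed;
    }

    void addNode(String node) {
        if (!adjacency.containsKey(node)) {
            adjacency.put(node, new ArrayList<Edge>());
        }
    }

    // Adds an edge; an undirected graph stores it in both directions.
    void addEdge(String from, String to, int weight) {
        addNode(from);
        addNode(to);
        adjacency.get(from).add(new Edge(to, weight));
        if (!directed) {
            adjacency.get(to).add(new Edge(from, weight));
        }
    }

    // Number of edges leaving the given node (0 for an unknown node).
    int degree(String node) {
        List<Edge> edges = adjacency.get(node);
        return edges == null ? 0 : edges.size();
    }

    public static void main(String[] args) {
        WeightedGraph friends = new WeightedGraph(false); // friendship is mutual
        friends.addEdge("A", "B", 1);
        WeightedGraph follows = new WeightedGraph(true);  // following is one-way
        follows.addEdge("A", "B", 1);
        System.out.println(friends.degree("B")); // 1
        System.out.println(follows.degree("B")); // 0
    }
}
```

In the undirected case the edge A-B is reachable from both ends, whereas in the directed case B has no outgoing edge.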


[5] Bijulsoni. “Introduction to Graph with Breadth First Search(BFS) and Depth First Search(DFS) Traversal Implemented in JAVA.” CodeProject, www.codeproject.com/Articles/32212/Introduction-to-Graph-with-Breadth-First-Search-BF.

[Figure: an example of a directed graph on 4 vertices, and an undirected graph on 4 vertices.]


Since they are powerful abstractions, graphs are very important in modelling data. In fact, many problems can be reduced to known graph problems, because the nodes of a graph can represent objects and the edges can represent the relationships between them.


Some real-world solutions that use graph data structures include:


1. Social network graphs. These are graphs that represent who knows whom and who communicates with whom. An example is the Twitter graph of who follows whom. Such graphs can be used to determine how information flows, how topics trend, etc.


2. Transportation networks. In road networks, vertices are intersections and edges are the road segments between them; in public transportation networks, vertices are stops and edges are the links between them. Such networks are used by many map programs such as Google Maps and Bing Maps to find the best routes between locations, and are also used for studying traffic patterns, traffic light timings and many other aspects of transportation.


3. Social network Graphs 

A social graph is a diagram that illustrates interconnections among people, groups and organizations in a social network. The term refers both to the social network itself and to a diagram representing the network. Individuals and organizations, called actors, are nodes on the graph.[6] For the purposes of this research, I am going to discuss graph structures used by social networks and then analyse two algorithms that provide information from the data stored in the graph.


3.1 Why do social networks use Graph data structures? 


What makes graphs special is that they represent relationships between things, from the most abstract to the most concrete, e.g. mathematical objects, things and events. A social network is an umbrella term for a set of nodes (individuals, groups, organizations and related systems) tied together by one or more types of interdependency.


However, the type of graph depends entirely on the architecture used by the company providing the online social network service. For a social network in which the relationship between nodes is bidirectional, an undirected graph is used, as in the Facebook graph. Conversely, if the relationship between nodes is one-directional, a directed graph is used. Social network analysis is focused on uncovering the patterns of people's interaction, and one of the features most users of a social network rely on is search.


[6] “What Is Social Graph? - Definition from WhatIs.com.” WhatIs.com, whatis.techtarget.com/definition/social-graph.


3.2 Graph Traversals 


In recent years, huge graphs with billions of vertices and edges have become very common, because accounts are created every day, more relationships form between users, and more data is produced by each node in the network.


Search in graph terminology is known as graph traversal: a means of visiting every vertex and edge exactly once in a well-defined order. The two basic graph traversals are breadth-first search (BFS) and depth-first search (DFS). As their names denote, these algorithms are named after the manner in which each traverses (searches) a graph.


Breadth first search

Breadth-first search is a graph traversal algorithm that starts from a selected node, called the source node, and explores the graph layer by layer. It first visits the neighbours of the source node and then moves on to the next-level neighbour nodes, that is, the neighbours of those neighbours.


So, first move horizontally and visit all the nodes of the current layer, then move to the next layer.


[Figure: a graph with source node 0, nodes 1, 2 and 3 in layer 1, and further nodes in layer 2.]


Looking at the graph above, BFS first visits the source node, which is 0, and then visits its immediate neighbours 1, 2 and 3, which are all in the same layer, before moving to the next layer to explore the others. The nodes in layer 1 are closer to the source node than the nodes in layer 2. Therefore, in BFS, you must traverse all the nodes in layer 1 before you move to the nodes in layer 2.


BFS uses a queue to store each node once it has been marked as 'visited', and the node stays in the queue until its neighbours (the vertices directly connected to it) have been marked in turn. The queue follows the First In First Out (FIFO) method, and therefore the neighbours of a node will be visited in the order in which they were inserted into the queue.
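The queue-based procedure described above can be sketched as follows. This is a minimal illustration in plain Java, not the JGraphT-based code used in the experiment; the class name BfsSketch, the method bfsOrder and the adjacency-map representation are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical illustration of breadth-first search with a FIFO queue.
class BfsSketch {
    // Returns the nodes in the order BFS visits them, starting from source.
    static List<Integer> bfsOrder(Map<Integer, List<Integer>> graph, int source) {
        List<Integer> order = new ArrayList<Integer>();
        Set<Integer> visited = new HashSet<Integer>();
        Queue<Integer> queue = new ArrayDeque<Integer>();
        visited.add(source);
        queue.add(source);
        while (!queue.isEmpty()) {
            int node = queue.remove(); // oldest entry first (FIFO)
            order.add(node);
            for (int neighbour : graph.getOrDefault(node, Collections.<Integer>emptyList())) {
                // Set.add returns false if the node was already marked as visited.
                if (visited.add(neighbour)) {
                    queue.add(neighbour);
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // Source 0 with layer 1 = {1, 2, 3} and layer 2 = {4, 5, 6}.
        Map<Integer, List<Integer>> graph = new HashMap<Integer, List<Integer>>();
        graph.put(0, Arrays.asList(1, 2, 3));
        graph.put(1, Arrays.asList(0, 4));
        graph.put(2, Arrays.asList(0, 5));
        graph.put(3, Arrays.asList(0, 6));
        System.out.println(bfsOrder(graph, 0)); // [0, 1, 2, 3, 4, 5, 6]
    }
}
```

All of layer 1 appears in the output before any node of layer 2, which is the layer-by-layer behaviour described in the text.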


Depth first Search
Depth-first search is a recursive algorithm that uses the idea of backtracking. It involves an exhaustive search of all the nodes by going ahead if possible, and otherwise by backtracking.[7] All the nodes on the current path are visited until no unvisited nodes remain along it, after which the next path is selected.


[Figure: DFS traversal of a graph.]


DFS is implemented using a stack. A starting node is picked and all its adjacent nodes are pushed onto the stack. A node is then popped from the stack to select the next node to visit, and all its adjacent nodes are pushed onto the stack. This process is repeated until the stack is empty. However, the nodes that are visited have to be marked, or else a node might be visited more than once and an infinite loop would occur.[8]


[7] Here, the word backtrack means that when you are moving forward and there are no more nodes along the current path, you move backwards on the same path to find nodes to traverse.
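The stack-based procedure described above can be sketched as follows. This is a minimal illustration in plain Java using an explicit stack rather than recursion; it is not the JGraphT-based code used in the experiment, and the class name DfsSketch, the method dfsOrder and the adjacency-map representation are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of depth-first search with an explicit stack.
class DfsSketch {
    // Returns the nodes in the order DFS visits them, starting from source.
    static List<Integer> dfsOrder(Map<Integer, List<Integer>> graph, int source) {
        List<Integer> order = new ArrayList<Integer>();
        Set<Integer> visited = new HashSet<Integer>();
        Deque<Integer> stack = new ArrayDeque<Integer>();
        stack.push(source);
        while (!stack.isEmpty()) {
            int node = stack.pop(); // newest entry first (LIFO)
            if (!visited.add(node)) {
                continue; // already visited: skipping it prevents an infinite loop
            }
            order.add(node);
            List<Integer> neighbours = graph.getOrDefault(node, Collections.<Integer>emptyList());
            // Push in reverse so that the first-listed neighbour is explored first.
            for (int i = neighbours.size() - 1; i >= 0; i--) {
                int neighbour = neighbours.get(i);
                if (!visited.contains(neighbour)) {
                    stack.push(neighbour);
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> graph = new HashMap<Integer, List<Integer>>();
        graph.put(0, Arrays.asList(1, 2, 3));
        graph.put(1, Arrays.asList(0, 4));
        graph.put(2, Arrays.asList(0, 5));
        graph.put(3, Arrays.asList(0, 6));
        System.out.println(dfsOrder(graph, 0)); // [0, 1, 4, 2, 5, 3, 6]
    }
}
```

Each branch is followed to its end (0, 1, 4) before the search backtracks to the next neighbour of the source.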


Methodology and Experimental Setup 


Concerning the memory used by both algorithms, I hypothesize that BFS will use significantly more memory than DFS. During execution, BFS selects a vertex and inserts all of its adjacent vertices into a queue, then takes the next vertex and inserts all of its adjacent vertices into the same queue, so the queue must hold an entire layer of the graph at once. DFS performs the same kind of insertion and deletion, but it removes a node from memory once its descendants have been expanded. I also hypothesize that neither algorithm will dominate with regard to time, as the search time might depend on other factors that are beyond the scope of this paper.


I plan to conduct an experiment in Java using an external library called JGraphT, a popular library for implementing graph data structures and algorithms. I am going to implement a social network graph in Java. This graph is extracted from Facebook and consists of people (nodes) with edges representing friendship ties, based on students from colleges in the United States of America. These colleges include the California Institute of Technology, Reed College, Haverford College, Swarthmore College, Middlebury College, Bucknell University, Johns Hopkins University and the Massachusetts Institute of Technology. The data for these colleges can be found in the appendix.


[8] “Breadth First Search Tutorials & Notes | Algorithms.” HackerEarth, www.hackerearth.com/practice/algorithms/graphs/breadth-first-search/tutorial/.


I plan to use a 13-inch MacBook Air running Mac OS X 10.12.5, with a total of 4 processors and 8 gigabytes of physical memory. My integrated development environment (IDE) will be Eclipse Neon.3, because it is the one I am most proficient in and it is a suitable IDE for Java programming.


First, I plan to implement a model graph that has the same number of nodes and relationships as the dataset I obtained, but with randomized relationships between the nodes. I am keeping the number of nodes and edges the same but randomizing the edges because it would be quite cumbersome to implement, for example, a total of 6400 nodes and 251,200 edges “manually”.


Nevertheless, because the graph I initially produce is randomised, the relationships between nodes might change every time the program is run. I therefore plan to write each generated graph to a file as a serialised object and then search for a particular node in each stored graph. This ensures that the graph both algorithms traverse is constant, which eliminates disparities that might otherwise arise while collecting data.


I will then implement both DFS and BFS searches using the JGraphT library and measure the time and memory used by each algorithm by recording the initial and final time and memory and subtracting to find the difference. I plan to use built-in Java functions to obtain the memory and time statistics of both algorithms.


Figure 1 


public static class RandomGraphCreator {


The method createRandomUndirectGraph() creates an undirected graph whose nodes are Integer objects that are randomly connected. The numbers of nodes and edges passed to the method as parameters are derived from the real-world dataset of colleges in Table 1.
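The idea of a random undirected graph with a fixed number of nodes and edges can be sketched as follows. This is a hypothetical plain-Java illustration and not the method shown in Figure 1: the class RandomGraphSketch, the method createRandomUndirectedGraph, the helper countEdges and the seed parameter are assumptions made for the example, and the graph is stored as a map of neighbour sets instead of a JGraphT object.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Hypothetical illustration: random undirected graph with fixed node and edge counts.
class RandomGraphSketch {
    // Builds a simple undirected graph on nodes 0..nodeCount-1 with exactly
    // edgeCount distinct edges chosen at random (no self-loops, no duplicates).
    static Map<Integer, Set<Integer>> createRandomUndirectedGraph(int nodeCount, int edgeCount, long seed) {
        long maxEdges = (long) nodeCount * (nodeCount - 1) / 2;
        if (edgeCount < 0 || edgeCount > maxEdges) {
            throw new IllegalArgumentException("edgeCount must be between 0 and " + maxEdges);
        }
        Map<Integer, Set<Integer>> adjacency = new HashMap<Integer, Set<Integer>>();
        for (int node = 0; node < nodeCount; node++) {
            adjacency.put(node, new HashSet<Integer>());
        }
        Random random = new Random(seed); // same seed, same graph
        int added = 0;
        while (added < edgeCount) {
            int a = random.nextInt(nodeCount);
            int b = random.nextInt(nodeCount);
            if (a == b) {
                continue; // skip self-loops
            }
            // Set.add returns false when the edge already exists.
            if (adjacency.get(a).add(b)) {
                adjacency.get(b).add(a);
                added++;
            }
        }
        return adjacency;
    }

    // Each undirected edge is stored at both of its endpoints.
    static int countEdges(Map<Integer, Set<Integer>> adjacency) {
        int endpoints = 0;
        for (Set<Integer> neighbours : adjacency.values()) {
            endpoints += neighbours.size();
        }
        return endpoints / 2;
    }

    public static void main(String[] args) {
        // Node and edge counts of the Caltech row in Table 1.
        Map<Integer, Set<Integer>> graph = createRandomUndirectedGraph(769, 17000, 42L);
        System.out.println(graph.size() + " nodes, " + countEdges(graph) + " edges");
    }
}
```

Fixing the seed is an alternative way of obtaining the same graph on every run; the essay instead keeps the graph constant by serialising it to a file.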


Table 1

Name of College                          |V| (No. of nodes in graph)    |E| (No. of edges in graph)
California Institute of Technology       769                            17000
Reed College                             962                            19000
Swarthmore College                       2000                           61000
Middlebury College                       3000                           125000
Bucknell University                      4000                           159000
Johns Hopkins University                 5000                           187000
Massachusetts Institute of Technology    6000                           251000


The dfsSearch() and bfsSearch() methods, shown in figures 2 and 3, were implemented with the help of JGraphT and traversed the graph to find a specific node, which was randomly generated and recorded as 745.


private static <Int> void dfsSearch(Graph<Int, DefaultEdge> graph, Int search) {
    GraphIterator<Int, DefaultEdge> iterator = new DepthFirstIterator<Int, DefaultEdge>(graph);
    while (iterator.hasNext()) {
        if (!iterator.next().equals(search)) {
            System.err.println("Not what you are Looking For");
        } else {
            System.out.println("Found search in graph: " /* ... */);
            // the rest of the listing is not legible in the source
        }
    }
}
To obtain the time and memory used during the execution of a particular search, the Java code in figure 4 below was implemented.


Graph = ReadGraph(Graph);
System.gc();
Thread.sleep(5000);

long InitialMemory = runtime.totalMemory();
long begin = System.currentTimeMillis();

bfsSearch(Graph, 745);

long endd = System.currentTimeMillis();
long freeMemory = runtime.freeMemory();

long usedMemory = (InitialMemory - freeMemory);
long time = endd - begin;
// Note: "/ 1024 * 1024" divides and then multiplies by 1024, so this prints a value
// in bytes (rounded down to a multiple of 1024) despite the "mb" label; dividing by
// (1024 * 1024) would give megabytes.
System.out.println((usedMemory) / 1024 * 1024 + "mb");
System.out.println(time + "millisecs");

fig. 4


I decided that after reading the graph from the file, I needed to ensure that all unused objects generated while reading the file were deleted from memory, so that they would not affect the memory measurements. Thus, I ran the garbage collector explicitly. I then took measurements of memory and time by wrapping them around the search method, as shown in figure 4 above. The memory figures were obtained with the help of the Java class Runtime, and the times with System.currentTimeMillis().
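The wrapping approach described above can be sketched as a small reusable helper. This is a hypothetical illustration and not the code in figure 4: the class MeasureSketch, the Measurement holder and the methods measure and toMegabytes are assumptions made for the example. It samples used heap (total minus free) both before and after the task, which is one possible way of taking the difference; the reading is only approximate, because the garbage collector may run at any time and can even make the difference negative.

```java
// Hypothetical illustration: timing a task and sampling heap use with Runtime.
class MeasureSketch {
    // Holds the result of one measurement.
    static final class Measurement {
        final long millis;    // elapsed wall-clock time in milliseconds
        final long bytesUsed; // change in used heap, in bytes (approximate)

        Measurement(long millis, long bytesUsed) {
            this.millis = millis;
            this.bytesUsed = bytesUsed;
        }
    }

    static Measurement measure(Runnable task) {
        Runtime runtime = Runtime.getRuntime();
        System.gc(); // a request only; the JVM is free to ignore it
        long usedBefore = runtime.totalMemory() - runtime.freeMemory();
        long start = System.nanoTime();

        task.run();

        long elapsedNanos = System.nanoTime() - start;
        long usedAfter = runtime.totalMemory() - runtime.freeMemory();
        return new Measurement(elapsedNanos / 1000000L, usedAfter - usedBefore);
    }

    // Converts bytes to megabytes; note the parentheses around the divisor.
    static double toMegabytes(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        Measurement m = measure(new Runnable() {
            @Override
            public void run() {
                int[] data = new int[1000000]; // stand-in for a search
                data[0] = 1;
            }
        });
        System.out.println(m.millis + " ms, " + toMegabytes(m.bytesUsed) + " MB");
    }
}
```

In practice a search would be passed in place of the array allocation, and each measurement would be repeated several times and averaged, as in the three trials recorded in Appendix 1.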


An example of the whole program, showing both the writing and the reading of the Caltech graph, is shown below. It was executed in the same way for all the colleges with their respective numbers of nodes and edges (relationships).


[Figure: two code listings of the class jgraphtest. The first creates the Caltech graph with createRandomUndirectGraph(), writes it to a file with WriteGraph() and prints a confirmation message; the second reads the graph back, runs the search for the target node and prints the memory and time used, as in fig. 4.]
Analysis and Evaluation
Although the repository provided information about the Facebook relationships of students in many universities in the United States, I chose these universities based on whether they were co-educational, in order to avoid any gender inconsistencies, and most importantly based on how much their numbers of nodes and edges varied from each other, in order to establish differences large enough to have significant effects.


Graph 1. Comparison of the time used by DFS and BFS to find node 745 (x-axis: selected universities; y-axis: time for search / ms)


Graph 1 shows that DFS used less time to find node 745 in Caltech, Reed, Haverford, Swarthmore, Bucknell and MIT. However, BFS used less time to find node 745 in Middlebury and Johns Hopkins. Overall, therefore, DFS used less time to search for node 745. However, because BFS found node 745 faster than DFS in the graphs of two schools, I had to consider factors other than the design of the two algorithms, such as the structure of the graph, the position of the target node in the graph, or anomalies. I retook the data for the Middlebury and Johns Hopkins graphs to investigate whether I had an anomaly, but the data remained consistent: BFS still used less time. Owing to JGraphT's limitations and my own programming skills, I could not determine the structure of the graph or the position of the node in the implemented graphs of either school. Instead, I found the number of edges node 745 had in the Middlebury and Johns Hopkins graphs: 85 and 66 edges respectively.


From this information, I could not find any reason why BFS took less time to find node 745. However, thinking about the approach each algorithm uses, i.e. exploring the nearest nodes layer by layer (as in BFS) versus exploring the deepest nodes first (as in DFS), I hypothesize that node 745 was perhaps in a shallower layer, closer to the starting node, in the graphs of both schools, and hence BFS took less time.


Graph 2. Comparison of the memory used by BFS and DFS to find node 745 (x-axis: selected universities; y-axis: memory used / MB)


The memory used by the two algorithms during the search for node 745 was very close, and in some cases identical. In Graph 2, the universities marked with a green circle are those where BFS and DFS used the same amount of memory during the search, while those in the black ellipses are those where DFS used less memory than BFS. The closeness of the memory used in Reed and Haverford, if we relate it back to the design of both algorithms, made me speculate that perhaps BFS and DFS had explored the same number of nodes.
Figure 5


However, after implementing the DFS and BFS traversal methods shown in fig. 5, I realised that in the graph for Reed, BFS explored 481 nodes whereas DFS explored only 133 nodes.


Surprisingly, in the graph of Haverford, the BFS and DFS algorithms each explored 500 nodes before finding node 745. My speculation was therefore only partly accurate. To find out why both algorithms used the same amount of memory, I think a memory profiler application could be used.


Nevertheless, out of curiosity, I decided to find out the number of nodes explored by each search algorithm in the graphs of the universities that show a visibly significant difference in Graph 2 (i.e. Middlebury and MIT). In the Middlebury graph, DFS explored 304 nodes before finding node 745, whereas BFS explored 1500 nodes. In the MIT graph, DFS explored 869 nodes and BFS explored 3000 nodes before finding node 745. Therefore, the number of nodes explored could be a reason for the difference in memory used.


University       (BFS) No. of nodes explored    (DFS) No. of nodes explored
Caltech          63                             262
Reed             481                            133
Haverford        500                            500
Swarthmore       1000                           1000
Middlebury       1500                           304
Bucknell         2000                           2000
John Hopkins     2500                           2500
MIT              3000                           869
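Counts of this kind can be gathered by incrementing a counter each time a node is taken off the frontier and stopping when the target is reached. The following is a minimal sketch in plain Java and not the method shown in Figure 5: the class ExploredCountSketch, the method nodesExplored and the breadthFirst flag are assumptions made for the example. The only difference between the two searches here is which end of the frontier the next node is taken from.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration: counting how many nodes a search explores.
class ExploredCountSketch {
    // Returns the number of nodes explored up to and including the target,
    // or -1 if the target cannot be reached from the source.
    static int nodesExplored(Map<Integer, List<Integer>> graph, int source, int target,
                             boolean breadthFirst) {
        Set<Integer> visited = new HashSet<Integer>();
        Deque<Integer> frontier = new ArrayDeque<Integer>();
        frontier.addLast(source);
        int explored = 0;
        while (!frontier.isEmpty()) {
            // BFS takes the oldest entry (queue); DFS takes the newest entry (stack).
            int node = breadthFirst ? frontier.removeFirst() : frontier.removeLast();
            if (!visited.add(node)) {
                continue; // already explored
            }
            explored++;
            if (node == target) {
                return explored;
            }
            for (int neighbour : graph.getOrDefault(node, Collections.<Integer>emptyList())) {
                if (!visited.contains(neighbour)) {
                    frontier.addLast(neighbour);
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Node 0 has neighbours 1, 2 and 3; a chain 3 -> 4 -> 5 hangs off node 3.
        Map<Integer, List<Integer>> graph = new HashMap<Integer, List<Integer>>();
        graph.put(0, Arrays.asList(1, 2, 3));
        graph.put(3, Arrays.asList(4));
        graph.put(4, Arrays.asList(5));
        System.out.println(nodesExplored(graph, 0, 5, true));  // 6 (deep target, BFS)
        System.out.println(nodesExplored(graph, 0, 5, false)); // 4 (deep target, DFS)
        System.out.println(nodesExplored(graph, 0, 1, true));  // 2 (shallow target, BFS)
        System.out.println(nodesExplored(graph, 0, 1, false)); // 6 (shallow target, DFS)
    }
}
```

In this small example the deep target favours DFS and the shallow target favours BFS, which shows how the count depends on where the target sits relative to the starting node.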
However, after finding the number of nodes explored in all graphs by both algorithms, I realised that my speculation about memory size and nodes explored was inconsistent with Johns Hopkins, Haverford, Swarthmore and Bucknell, where both algorithms explored the same number of nodes but DFS still used less memory. In the case of Caltech, DFS explored more nodes than BFS, yet DFS still used less memory than BFS according to the data shown in Graph 2.


Even though the number of nodes explored did not have a direct relationship with the memory used by both algorithms, it gave me a sense of the position of node 745 in the graphs where the numbers of nodes explored differed. For example, in the graph of Middlebury, I could predict that node 745 is deeper in the graph, because DFS, which explores depth-wise, explored fewer nodes and used less memory to reach it than BFS, which explores breadth-wise. In the case of Caltech, given the difference in the number of nodes explored by the two algorithms, I could predict that node 745 was in a shallower position, close to the starting node, so BFS explored fewer nodes before reaching it. Intuitively, if BFS explored fewer nodes, it should have been faster than DFS in finding node 745. However, according to Graph 1, DFS took less time to find node 745.


Owing to the inconsistencies in both Graph 1 and Graph 2, I think a major flaw in my design is the fact that I did not have any method that could provide information on the position of node 745 or the structure of the implemented graph. Including one could have given better information from which to draw suitable conclusions.


Possible extensions of this paper could include implementing one graph structure with a set number of nodes and randomized edges, and finding randomized nodes in the graph using DFS or BFS. Another extension could be the implementation of a graph that models an actual real-world data set of people's relationships, to find out whether the time and memory values would be more consistent.


Conclusions 

According to my data, depth-first search can generally be used for search in social graph data structures when the priorities of the organization implementing the system are conserving memory and time. However, breadth-first search can be used when such metrics are not priorities. Concerning the number of nodes explored and the memory used, depth-first search is mostly the more efficient, although, as my data reflects, this is not always the case. Nevertheless, even though the time used by both search algorithms depends on the path they take, that is, whether depth first or breadth first, there might still be other factors beyond the scope of this paper that determine how much time and memory DFS or BFS uses.
Appendices


Appendix 1 


                                             Breadth first search                                    Depth first search
University      No. of nodes  No. of edges   memory/mb   trial 1/ms  trial 2/ms  trial 3/ms  avg. time/ms   memory/mb   trial 1/ms  trial 2/ms  trial 3/ms  avg. time/ms
Caltech         769           17000          118717440   77          78          ?           ?              118716416   64          ?           ?           61.00
Reed            962           19000          117965824   ?           78          ?           65.00          117965824   ?           ?           56          ?
Haverford       1000          60000          134845440   146         ?           ?           155.00         132119552   136         ?           ?           ?
Swarthmore      2000          61000          134251520   166         166         ?           369.67         ?           158         ?           359         246.00
Middlebury      3000          125000         109386752   162         ?           187         ?              106659840   ?           ?           ?           ?
Bucknell        4000          159000         94415872    184         185         ?           182.00         ?           ?           ?           ?           ?
Johns Hopkins   5000          187000         87044096    ?           ?           ?           784.33         ?           ?           ?           ?           ?
MIT             6000          251000         173066240   877         1052        ?           937.00         ?           ?           ?           ?           ?

(? marks a value that is not legible in the source.)


This shows the data table that was used in plotting Graph 1 and Graph 2.


Appendix 2 
This shows the repository from which I took the number of nodes and edges in each university's Facebook data.


Appendix 3. 


import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import org.jgrapht.Graph;
import org.jgrapht.graph.DefaultEdge;

// Serialises the graph to the file <name>.txt.
public static void WriteGraph(Graph<Integer, DefaultEdge> Graph, String name) {
    try {
        FileOutputStream file = new FileOutputStream(name + ".txt");
        ObjectOutputStream write = new ObjectOutputStream(file);
        write.writeObject(Graph);
        write.close();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}

// Reads a previously serialised graph back from the file <name>.txt.
@SuppressWarnings("unchecked")
public static Graph<Integer, DefaultEdge> ReadGraph(Graph<Integer, DefaultEdge> Graph, String name)
        throws ClassNotFoundException {
    try {
        FileInputStream fileIn = new FileInputStream(name + ".txt");
        ObjectInputStream read = new ObjectInputStream(fileIn);
        Graph = (org.jgrapht.Graph<Integer, DefaultEdge>) read.readObject();
        read.close();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    return Graph;
}


This shows the WriteGraph and ReadGraph methods used in my program.
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Appendix 4. 


[Screenshots: Spotlight search results for the generated files caltech.txt, reed.txt, haverford.txt, swarthmore.txt, Middlebury.txt, bucknell.txt, hopkins.txt and MIT.txt, each with a preview of the file's contents. Every preview shows a Java-serialized org.jgrapht.graph.SimpleGraph object (binary data that is not readable as text).]

These show all the files generated. 
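The contents of these files look unreadable because ObjectOutputStream writes a binary format, even though the file names end in .txt. As a hedged illustration only (it was not part of my program), the sketch below checks whether a file begins with the header that Java serialization writes. The class name SerializedFileCheck and the temporary file are assumptions for the example; the constants STREAM_MAGIC and STREAM_VERSION come from java.io.ObjectStreamConstants.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;

class SerializedFileCheck {

    // Returns true if the file starts with the Java serialization header:
    // a two-byte magic number followed by a two-byte stream version.
    static boolean looksSerialized(String path) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
            short magic = in.readShort();
            short version = in.readShort();
            return magic == ObjectStreamConstants.STREAM_MAGIC
                    && version == ObjectStreamConstants.STREAM_VERSION;
        } catch (EOFException e) {
            // Fewer than four bytes: too short to hold the header.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical test file: serialize a single Integer into it.
        File tmp = File.createTempFile("graph", ".txt");
        try (ObjectOutputStream out =
                     new ObjectOutputStream(new FileOutputStream(tmp))) {
            out.writeObject(Integer.valueOf(42));
        }
        System.out.println(looksSerialized(tmp.getPath())); // prints "true"
        tmp.delete();
    }
}
```

A file that passes this check can only be loaded back with ObjectInputStream, as in the read graph method in Appendix 3, not opened as ordinary text.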